Human Dominant Disease Genes Are Enriched in Paralogs Originating from Whole Genome Duplication
Authors
Abstract
PLOS Computational Biology recently published an article by Chen, Zhao, van Noort, and Bork [1] reporting that, in contrast to duplicated nondisease genes, human monogenic disease (MD) genes are (1) enriched in duplicates (in agreement with earlier reports [2–5]) and (2) more functionally similar to their closest paralogs based on sequence conservation and expression profile similarity. Chen et al. then proposed that human MD genes frequently have functionally redundant paralogs that can mask the phenotypic effects of deleterious mutations. We would like to point out here two lines of evidence that appear more relevant to the explanation of this surprising enrichment of human disease genes in duplicates. The first line of evidence indicates that human gene duplicates should be distinguished depending on whether they originate from small-scale duplication (SSD) or from the two rounds of whole genome duplication (WGD) that occurred in early vertebrates some 500 million years ago. In fact, as shown quantitatively below using Chen et al.’s dataset, human MD genes are actually depleted, not enriched, in SSD duplicates, whereas they are clearly enriched in WGD duplicates when compared to nondisease genes. This opposite retention pattern cannot be explained by a selection mechanism independent of the SSD or WGD origin of MD gene duplicates. The second line of evidence concerns the mode of inheritance of human MDs, which provides a more stringent criterion than sequence conservation or coexpression profile to assess the likelihood of functional compensation by paralogs of MD genes. In particular, the recessiveness of a human disease is expected to be a prerequisite for functional compensation by a paralog gene.
Indeed, autosomal dominant MDs are unlikely to experience significant functional compensation from a different locus, since even a perfectly functional allele is unable to mask the deleterious phenotypic effects of a dominant allelic mutant on the same heterozygote locus. We first address the difference between SSD duplicates and WGD duplicates, also called “ohnologs” after Susumu Ohno’s early “2R hypothesis” [6], which has now been firmly established [7]. The importance of distinguishing between SSD and WGD duplicates in the human genome has already been reported in a number of papers [2–4,8], including our own [5,9]. As shown in Figure 1A, human genes tend to partition into three main gene categories with respect to duplicates: those with WGD but no SSD duplicates (about 28%), those with SSD but no WGD duplicates (about 41%), and singletons without WGD or SSD duplicates (about 24%), while human genes with both WGD and SSD duplicates are relatively rare (about 7%). Gene families enriched either in WGD or SSD duplicates also correspond to distinct functional classes [2,8], with WGD genes frequently involved in signaling, regulation, and development, whereas SSD genes are typically implicated in different functions such as antigen processing, immune response, and metabolism. In addition, human disease genes have been shown to be significantly enriched in WGD duplicates, while they are rather depleted in SSD duplicates [2,5,8,9]. This could not be seen with Chen et al.’s dataset, which lumps together all gene duplicates irrespective of their WGD or SSD origin. In fact, using the same monogenic disease (MD) dataset, we could readily extend these earlier results, as depicted in Figure 1B. MD genes are significantly enriched in ohnologs, 38.3% versus 27.7% (p = 1.58 × 10^…; Fisher’s Exact [FE] test), while showing at the same time a significant depletion in both singletons, 16.5% versus 23.7% (p = 7.67 × 10^…; FE test), and SSD, 36.1% versus 41.6% (p = 2.75 × 10^…; FE test).
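To make the enrichment comparisons above concrete, the following is a minimal sketch of a two-sided Fisher's Exact test on a 2×2 contingency table, written with the Python standard library only. The function name `fisher_exact_two_sided` and the gene counts in the usage lines are assumptions for illustration: the text reports only percentages, so the counts below are hypothetical values chosen to reproduce 38.3% versus 27.7% and will not reproduce the reported p-values.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's Exact test for the 2x2 table [[a, b], [c, d]].

    Rows: gene class (e.g. MD genes, nondisease genes).
    Columns: category membership (e.g. ohnolog, not ohnolog).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)   # smallest feasible top-left count
    hi = min(row1, col1)       # largest feasible top-left count
    # Sum the probabilities of every table no more likely than the observed
    # one; the small relative tolerance guards against floating-point ties.
    p = sum(q for q in map(prob, range(lo, hi + 1)) if q <= p_obs * (1 + 1e-9))
    return min(1.0, p)


# Hypothetical counts (NOT from the paper): 383 of 1,000 MD genes and
# 2,770 of 10,000 nondisease genes are ohnologs.
md_ohno, md_other = 383, 617
nd_ohno, nd_other = 2770, 7230
p_value = fisher_exact_two_sided(md_ohno, md_other, nd_ohno, nd_other)
print(f"two-sided Fisher's Exact p-value: {p_value:.3g}")
```

The same function would apply unchanged to the singleton and SSD comparisons, given the corresponding gene counts for MD and nondisease genes.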
MD genes are more specifically depleted in recent SSD, 9.2% versus 17.3% (p = 4.1 × 10^…; FE test), while WGD-old and older SSD of MD genes are not significantly biased, i.e., 9.9% versus 9.2% (p = 0.12; FE test) and 17% versus 15.5% (p = 0.001; FE test), respectively (see below). These results demonstrate that, although MD genes retain significantly more duplicates than singletons (Figure 1B), these duplicates are primarily enriched in ohnologs and not SSD copies, as compared to the relative WGD and SSD content of the entire human genome (Figure 1A, Dataset S1). To explain the global enrichment in MD gene duplicates, Chen et al. noticed that coexpression between MD genes and their closest paralogs is in general higher than that of nondisease genes (p = 0.00298, Figure 2B in [1]), which they interpret as evidence that “functional compensation by duplication of genes masks the phenotypic effects of deleterious mutations and reduces the probability of purging the defective genes from the human population.” In particular, the retention of MD gene duplicates should be favored by the higher functional redundancy of recent, less-diverged duplicates. However, investigating the age of SSD duplicates from MD genes suggests rather the opposite, as MD genes tend to have fewer recent SSD than old SSD duplicates, as compared to nondisease (ND) genes (Figures 1A and 1B). In particular, focusing on genes with SSD but no ohnolog, we found that 9.2% [respectively 17%] of MD genes have SSD that are more recent [respectively ancient] than the two rounds of whole-
Similar resources
Identification of Ohnolog Genes Originating from Whole Genome Duplication in Early Vertebrates, Based on Synteny Comparison across Multiple Genomes
Whole genome duplications (WGD) have now been firmly established in all major eukaryotic kingdoms. In particular, all vertebrates descend from two rounds of WGDs, that occurred in their jawless ancestor some 500 MY ago. Paralogs retained from WGD, also coined 'ohnologs' after Susumu Ohno, have been shown to be typically associated with development, signaling and gene regulation. Ohnologs, which...
A Minimal Set of Glycolytic Genes Reveals Strong Redundancies in Saccharomyces cerevisiae Central Metabolism
As a result of ancestral whole-genome and small-scale duplication events, the genomes of Saccharomyces cerevisiae and many eukaryotes still contain a substantial fraction of duplicated genes. In all investigated organisms, metabolic pathways, and more particularly glycolysis, are specifically enriched for functionally redundant paralogs. In ancestors of the Saccharomyces lineage, the duplicatio...
Functional analysis of gene duplications in Saccharomyces cerevisiae.
Gene duplication can occur on two scales: whole-genome duplications (WGD) and smaller-scale duplications (SSD) involving individual genes or genomic segments. Duplication may result in functionally redundant genes or diverge in function through neofunctionalization or subfunctionalization. The effect of duplication scale on functional evolution has not yet been explored, probably due to the lac...
Gene loss and evolutionary rates following whole-genome duplication in teleost fishes.
Teleost fishes provide the first unambiguous support for ancient whole-genome duplication in an animal lineage. Studies in yeast or plants have shown that the effects of such duplications can be mediated by a complex pattern of gene retention and changes in evolutionary pressure. To explore such patterns in fishes, we have determined by phylogenetic analysis the evolutionary origin of 675 Tetra...
Origin and evolution of GATA2a and GATA2b in teleosts: insights from tongue sole, Cynoglossus semilaevis
Background. Following the two rounds of whole-genome duplication that occurred during deuterostome evolution, a third genome duplication occurred in the lineage of teleost fish and is considered to be responsible for much of the biological diversification within the lineage. GATA2, a member of GATA family of transcription factors, is an important regulator of gene expression in hematopoietic ce...
Journal title:
Volume 10, issue
Pages -
Publication date 2014